A new code-reasoning LLM fine-tuned from DeepSeek-R1-Distill-Qwen-14B using distributed RL with GRPO+ and iterative context lengthening. Trained on ~24K coding problems (TACO-Verified, PrimeIntellect SYNTHETIC-1, LCB v5), it lifts Pass@1 on LiveCodeBench v5 to 60.6%, +7.6% over the base model and on par with OpenAI o3-mini.
- GRPO+: removes the KL and entropy loss terms for stability; adds offline difficulty filtering, DAPO-inspired loss masking, and reward clipping (see the loss sketch after this list).
- Iterative context scaling: 16K → 32K → 64K generalization with improved long-context reasoning (schedule helper below).
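
The post lists GRPO+'s changes without code, so here is a minimal PyTorch sketch of what such a per-token group-relative loss could look like. The function name, tensor shapes, clip values, and reward bounds are all illustrative assumptions, not the authors' implementation:

```python
import torch

def grpo_plus_loss(logp_new, logp_old, rewards, loss_mask,
                   clip_low=0.2, clip_high=0.28,
                   reward_clip=(-1.0, 1.0)):
    """Hypothetical GRPO+-style loss over a group of G rollouts.

    logp_new, logp_old: (G, T) per-token log-probs under new/old policy
    rewards:            (G,) scalar outcome reward per rollout (G > 1)
    loss_mask:          (G, T) 1 for tokens that count; 0 masks out e.g.
                        truncated overlong rollouts, as in DAPO
    """
    # Reward clipping: bound outcome rewards before computing advantages.
    r = rewards.clamp(*reward_clip)

    # Group-relative advantage: normalize rewards within the rollout group.
    adv = (r - r.mean()) / (r.std() + 1e-8)   # (G,)
    adv = adv.unsqueeze(1)                     # broadcast over tokens

    # PPO-style clipped surrogate with an asymmetric ("clip-high") band
    # in the spirit of DAPO. No KL or entropy term is added, matching
    # the removal described above.
    ratio = (logp_new - logp_old).exp()
    unclipped = ratio * adv
    clipped = ratio.clamp(1 - clip_low, 1 + clip_high) * adv
    per_token = -torch.min(unclipped, clipped)

    # Loss masking: average only over unmasked tokens.
    return (per_token * loss_mask).sum() / loss_mask.sum().clamp(min=1)
```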
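Iterative context lengthening amounts to raising the rollout length cap in stages during training. A tiny self-contained helper to make the idea concrete; the 16K/32K values come from the post, while the step boundary and function name are made-up placeholders (per the post, 64K behavior is reached via generalization rather than a third training stage):

```python
def max_context_for_step(step: int,
                         schedule=((0, 16_384), (1_000, 32_768))) -> int:
    """Return the rollout length cap at a given RL training step.

    Hypothetical helper: each (start_step, max_tokens) pair opens a new
    stage; the cap of the latest stage whose start has been reached applies.
    """
    cap = schedule[0][1]
    for start, length in schedule:
        if step >= start:
            cap = length
    return cap

assert max_context_for_step(0) == 16_384      # first stage: 16K rollouts
assert max_context_for_step(2_000) == 32_768  # second stage: 32K rollouts
```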
Eval: strong results on LiveCodeBench, Codeforces, and HumanEval+.
Open weights:
https://huggingface.co/agentica-org/DeepCoder-14B-Preview
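
Since the weights are open, a minimal load-and-generate example with Hugging Face transformers (the model id is taken from the link above; the prompt and generation settings are illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "agentica-org/DeepCoder-14B-Preview"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

prompt = "Write a Python function that checks whether a string is a palindrome."
messages = [{"role": "user", "content": prompt}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Reasoning models emit long chains of thought; leave room for them.
outputs = model.generate(inputs, max_new_tokens=4096)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:],
                       skip_special_tokens=True))
```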
@opendatascience